OLAP with a Database Cluster
Abstract
This chapter presents a new approach to online decision support systems that is scalable, fast, and capable of analysing up-to-date data. It is based on a database cluster: a cluster of commercial off-the-shelf computers as hardware infrastructure and off-the-shelf database management systems as transactional storage managers. We focus on central architectural issues and on the performance implications of such a cluster-based decision support system. In the first half, we present a scalable infrastructure and discuss physical data design alternatives for cluster-based online decision support systems. In the second half of the chapter, we discuss query routing algorithms and freshness-aware scheduling. This protocol lets users decide how fresh the analysed data should be by allowing the online analytical processing (OLAP) nodes to have different degrees of freshness. In particular, it then becomes possible to trade freshness of data for query performance.

Introduction

Online analytical processing (OLAP) systems must cope with huge volumes of data and at the same time must allow for short response times to facilitate interactive usage. They must also scale, that is, be easily extensible as the accumulated data volumes grow. Furthermore, the requirement that the data analysed should be up-to-date is becoming more and more important. However, not only are these contrary requirements, but they also run counter to the performance needs of the day-to-day business. Most OLAP systems nowadays are kept separate from mission-critical systems. This means that they offer a compromise between "up-to-dateness," that is, freshness (or currency) of data, and query response times. The data needed are propagated into the OLAP system on a regular basis, preferably at times when this does not slow down day-to-day business, for example, during nights or weekends. OLAP users have no alternative but to analyse stale data.
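The trade-off between freshness and query performance mentioned in the abstract can be illustrated with a small routing sketch. This is not the chapter's protocol: the `OlapNode` fields, the `route_query` function, and the "least-loaded node that is fresh enough" policy are all hypothetical names and assumptions introduced here only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class OlapNode:
    """Hypothetical view of one OLAP node in the cluster (assumed fields)."""
    name: str
    staleness_s: float   # seconds the node lags behind the up-to-date data
    active_queries: int  # crude load indicator


def route_query(nodes: Sequence[OlapNode],
                max_staleness_s: float) -> Optional[OlapNode]:
    """Return the least-loaded node that is fresh enough, or None if none is."""
    fresh_enough = [n for n in nodes if n.staleness_s <= max_staleness_s]
    if not fresh_enough:
        # Assumed behaviour: the caller must refresh a node first or wait.
        return None
    # Ties on load are broken in favour of the fresher node.
    return min(fresh_enough, key=lambda n: (n.active_queries, n.staleness_s))


# Illustrative cluster state (made-up values).
nodes = [
    OlapNode("node-a", staleness_s=0.0, active_queries=5),
    OlapNode("node-b", staleness_s=600.0, active_queries=1),
    OlapNode("node-c", staleness_s=3600.0, active_queries=0),
]

strict = route_query(nodes, max_staleness_s=0.0)      # must be fully fresh
relaxed = route_query(nodes, max_staleness_s=900.0)   # 15 minutes of lag is fine
```

In this toy setting, a query that insists on fully fresh data can only run on the busy `node-a`, while a query that tolerates some staleness can be sent to a less loaded node. That is the sense in which freshness is traded for performance.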
Similar resources
OLAP Query Processing in Grids (work partially funded by CAPES-COFECUB (DAAD project), CNPq-INRIA (Gridata project), French ANR Massive Data (Respire project) and the European STREP Grid4All project)
OLAP query processing is critical for enterprise grids. Capitalizing on our experience with the ParGRES database cluster, we propose a middleware solution, GParGRES, which exploits database replication and inter- and intra-query parallelism to efficiently support OLAP queries in a grid. GParGRES has been partially implemented as database grid services on Grid5000. We give preliminary experimental...
Integrating Clustering Techniques and OLAP Methodologies: The ClustCube Approach
In this paper, we introduce ClustCube, an innovative OLAP-based framework for clustering and mining complex database objects extracted from distributed database settings. To this end, ClustCube puts together conventional clustering techniques and well-consolidated OLAP methodologies in order to achieve higher expressive power and mining effectiveness over traditional methodologies for mining tu...
High-Performance Query Processing of a Real-World OLAP Database with ParGRES
Typical OLAP queries take a long time to be processed so speeding up the execution of each single query is imperative to decision making. ParGRES is an open-source database cluster middleware for high performance OLAP query processing. By exploiting intra-query parallelism on PC clusters, ParGRES has shown excellent performance using the TPC-H benchmark. In this paper, we evaluate ParGRES on a ...
Online Analytical Processing (OLAP): A Fast and Effective Data Mining Tool for Gene Expression Databases
Gene expression databases contain a wealth of information, but current data mining tools are limited in their speed and effectiveness in extracting meaningful biological knowledge from them. Online analytical processing (OLAP) can be used as a supplement to cluster analysis for fast and effective data mining of gene expression databases. We used Analysis Services 2000, a product that ships with...
Apuama: Combining Intra-query and Inter-query Parallelism in a Database Cluster
Database clusters provide a cost-effective solution for high performance query processing. By using either inter- or intra-query parallelism on replicated data, they can accelerate individual queries and increase throughput. However, there is no database cluster that combines inter- and intra-query parallelism while supporting intensive update transactions. C-JDBC is a successful database cluster t...
SISYPHUS: A Chunk-Based Storage Manager for OLAP Cubes
In this paper, we present SISYPHUS, a storage manager for data cubes that provides an efficient physical base for performing OLAP operations. On-Line Analytical Processing (OLAP) poses new requirements to the physical storage layer of a database management system. Special characteristics of OLAP cubes such as multidimensionality, hierarchical structure of dimensions, data sparseness, etc., are ...
Publication date: 2009